Abstract: Evaluating the channel capacity is one of the key problems in information theory. In this work we derive rather mild sufficient conditions under which the capacity of continuous channels is finite and achievable.

These conditions are derived for generic, memoryless and possibly non-linear additive noise channels. The results are based on a novel sufficient condition that guarantees the convergence of differential entropies under point-wise convergence of probability density functions.

Perhaps surprisingly, the finiteness of channel capacity holds for the majority of setups, including those where inputs and outputs have possibly infinite second moments.

I. Introduction 

Over continuous-alphabet channels, a common belief is that with “sufficient” power, one is capable of transmitting at arbitrarily large rates. Stated differently, if an input of infinite power is allowed, the channel capacity is infinite. This belief is perhaps inspired by well-known examples such as the Additive White Gaussian Noise (AWGN) and linear Gaussian channels.

However, recent studies have suggested that for some channels this is not true: even if an infinite-power input is allowed, the achievable rates are not arbitrarily large:

• In [1], the authors studied a linear additive-noise channel where the noise is heavy-tailed, modeled using alpha-stable statistics. They showed that even if the input constraint allows for an infinite-power input, the channel capacity is finite. In fact, the authors found the optimal input to be, surprisingly, of finite power.

• In [2], the authors studied an additive Cauchy-distributed noise, and the constraints likewise allowed for infinite-power input signals. The capacity was proven to be finite despite the fact that the optimal input was found in this case to have infinite power.

The natural question that arises is: “under which conditions does one have a finite channel capacity?” The answer clearly depends on the input constraints, but also on the noise statistics. In this work, we study the interaction between the input constraints, the input-output relationship and the noise distribution, and derive conditions on this triplet under which the channel capacity is finite.

This guarantee of finiteness is of high significance, as it is typically the first step one would undertake in order to quantify the capacity of a channel at hand. Consider for example an additive Gaussian noise channel where the output Y is related to the input X as follows:

Y = X + N, (1) 

and where X is independent of the noise N. If no constraints are imposed on X, arbitrarily large transmission rates are achievable. If a second moment constraint is imposed instead, the capacity is finite. What if a “weaker” constraint is imposed on X? Could the rates be arbitrarily large? For illustrative purposes, consider the “weaker” constraint $\mathbb{E}\left[\ln^2(1+|X|)\right] \leq A$ for some $A > 0$. Through the change of variables $U = \operatorname{sgn}(X)\ln(1+|X|)$, channel (1) is equivalent to the channel:

$$ Y = \operatorname{sgn}(U)\left(e^{|U|} - 1\right) + N, $$

where $U$ is now average-power constrained, $\mathbb{E}[U^2] \leq A$. At first glance, it is not clear whether the capacity of such a channel is finite or not. Indeed, in some sense the channel is “exponentially amplifying” the input, and by more than what the cost is constraining it. An appropriately chosen Cauchy-distributed input X will satisfy the constraint but will have an infinite second moment. The average of $Y^2$ will be infinite as well. Is the capacity of this channel finite? Our result provides an unexpected positive answer to this question.
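The two moments at play in this example can be checked numerically. The sketch below assumes a standard (unit-scale) Cauchy input, which is our illustrative choice; any other scale only changes the value of $A$ needed. It estimates $\mathbb{E}[\ln^2(1+|X|)]$ by quadrature after the substitution $x = \tan\theta$, and evaluates the truncated second moment $\mathbb{E}[X^2; |X| \leq T]$ in closed form. The function names are hypothetical.

```python
import math


def cauchy_log2_moment(n: int = 200_000) -> float:
    """Approximate E[ln^2(1 + |X|)] for a standard Cauchy X.

    With x = tan(theta), E[g(|X|)] = (2/pi) * integral over [0, pi/2) of g(tan(theta)),
    evaluated by the midpoint rule; the integrand only blows up logarithmically at pi/2.
    """
    h = (math.pi / 2) / n
    total = sum(math.log1p(math.tan((k + 0.5) * h)) ** 2 for k in range(n))
    return (2 / math.pi) * total * h


def cauchy_truncated_second_moment(T: float) -> float:
    """E[X^2 ; |X| <= T] for a standard Cauchy X, in closed form.

    (2/pi) * integral_0^T x^2 / (1 + x^2) dx = (2/pi) * (T - arctan T), unbounded in T.
    """
    return (2 / math.pi) * (T - math.atan(T))


if __name__ == "__main__":
    print(f"E[ln^2(1+|X|)] ~ {cauchy_log2_moment():.4f}")  # a finite number
    for T in (1e2, 1e4, 1e6):
        print(T, cauchy_truncated_second_moment(T))         # grows roughly like 2T/pi
```

The first quantity settles to a finite value as the grid is refined, while the truncated second moment grows without bound in $T$: the logarithmic cost can be met even though $\mathbb{E}[X^2] = \infty$.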

Theoretical interests aside, it may seem unusual in a Gaussian setup to impose the constraint $\mathbb{E}\left[\ln^2(1+|X|)\right] \leq A$, or any other type of input constraint that permits $\mathbb{E}[X^2]$ to be infinite. However, when the channel model features noise distributions having an infinite second moment, as in the case of some channels subject to multiple-access [3] or radio-frequency [4] interference, imposing a second moment constraint becomes less sensible: such a constraint masks the characterization of the behaviour of the transmission rates as a function of the quality of the channel, since the channel signal-to-noise ratio will constantly evaluate to zero. Furthermore, we note that the usage of constraints allowing the input to have an infinite second moment has been previously examined within the context of robust estimation and detection theory [5]–[7].

More formally, the notion of capacity of a discrete memoryless channel was defined in the early works of Shannon [8], [9] to be “the largest” rate at which one can communicate over a channel with an arbitrarily low probability of error. Through a coding theorem, Shannon proved that the capacity is given by the solution to an optimization problem, whereby the mutual information between the input and output of the channel is maximized. When it comes to continuous channels whose inputs are potentially constrained, the results were extended (see for example [9]–[11]) and the channel capacity was also tied to a constrained optimization problem.
Naturally, in both setups it is implicitly assumed that the optimization problem is “well-defined”, for otherwise relating the channel capacity to the solution of a mutual information maximization is problematic. In this work we address this assumption and provide a sufficient condition for such an optimization problem to be both well-defined and to yield a finite and achievable solution for a wide range of channels.

We consider a generic average-cost constrained channel model where the noise is additive and absolutely continuous. We prove in Section III that, under very mild conditions on the noise and the constraint, the channel capacity is indeed finite and achievable.

We start by deriving sufficient conditions that ensure that the mutual information is finite, and hence well-defined, and we make use of the extreme value principle [12] to ensure that the maximization problem yields a finite and achievable solution. This can be achieved by enforcing two properties:

1- The input space of feasible distribution functions is compact.

2- The mutual information between the input and the output of the channel is continuous in the input distribution function.

We emphasize that these two properties are intimately related to the channel model and to the input constraints, if any.

The generic model adopted in this work encompasses multiple channel models found in the literature: we consider input-output relationships that are possibly non-linear; a generic average cost function $C(\cdot)$ is imposed on the input; the absolutely continuous additive noise has a finite “super-logarithmic moment”¹, as is the case for Gaussian, uniform, generalized Gaussian, generalized t, Pareto, Gamma and alpha-stable distributions, and their mixtures. We show that whenever the input cost function has a “super-logarithmic growth”³, the channel capacity is finite and achievable.

Establishing the continuity of mutual information under any “super-logarithmic” input constraint is achieved using a novel result on the convergence of differential entropies. While numerous studies have tackled this subject (see for example [13], [14]), the conditions presented in Section II are among the weakest that ensure this convergence whenever Probability Density Functions (PDFs) converge point-wise.

The rest of the paper is organized as follows: in Section II, a preliminary theorem concerning the convergence of differential entropies is stated and proved. The primary problem and the main result are presented in Section III, where we describe the channel model and state the conditions under which our result holds. Technical proofs are derived in Section IV. Some extensions are listed in Section V, and Section VI concludes the paper.


¹A “super-logarithmic moment” is an expectation of the form $\mathbb{E}[f(|X|)]$ for some function $f$ such that $f(|x|) = \omega(\ln(|x|))$.²

²We say that $f(x) = \omega(g(x))$ if and only if $\forall \kappa > 0$, $\exists c > 0$ such that $f(x) > \kappa\, g(x)$, $\forall x > c$.

³We say that a function $f(x)$ has a “super-logarithmic growth” whenever $f(|x|) = \omega(\ln(|x|))$.
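The definition of $\omega(\cdot)$ in footnote 2 asks, for every $\kappa$, for a threshold $c$ beyond which $f(x) > \kappa\, g(x)$. As a rough illustration only (a finite grid can suggest but never prove an asymptotic relation), the sketch below scans a log-spaced grid for such a threshold. The helper name `empirical_threshold` and the two test functions are our own choices: $\ln^2(1+x)$ satisfies $\ln^2(1+x) > \kappa\ln x$ as soon as $\ln x > \kappa$, so $c = e^{\kappa}$ works, whereas $\ln(1+x)$ is not $\omega(\ln x)$.

```python
import math
from typing import Callable, Optional, Sequence


def empirical_threshold(f: Callable[[float], float], g: Callable[[float], float],
                        kappa: float, grid: Sequence[float]) -> Optional[float]:
    """Smallest grid point c such that f(x) > kappa * g(x) at every grid point x >= c.

    The grid must be sorted in increasing order. Returns None when the inequality
    fails at the largest grid point. A finite grid can suggest, never prove, f = omega(g).
    """
    c = None
    for x in reversed(grid):  # walk down from the right end of the grid
        if f(x) > kappa * g(x):
            c = x
        else:
            break
    return c


def sq_log(x: float) -> float:
    """ln^2(1 + x): super-logarithmic growth."""
    return math.log1p(x) ** 2


def plain_log(x: float) -> float:
    """ln(1 + x): only logarithmic growth."""
    return math.log1p(x)


grid = [10 ** (k / 10) for k in range(1, 301)]  # log-spaced points from 10^0.1 to 10^30
print(empirical_threshold(sq_log, math.log, 5.0, grid))     # about 158.5 (grid point 10^2.2)
print(empirical_threshold(plain_log, math.log, 2.0, grid))  # None
```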


II. Convergence of differential entropies 

In this section we establish a sufficient condition for the convergence of differential entropies whenever there is point-wise convergence of the corresponding PDFs. More precisely, we prove a theorem that guarantees this convergence under rather mild sufficient conditions. In layman's terms, this theorem states that whenever the PDFs are uniformly bounded and have a uniformly bounded super-logarithmic moment, point-wise convergence implies convergence of differential entropies. We emphasize that the new conditions are weaker than some of those derived by Godavarti et al. [14, Thm. 1]. Alternative conditions found in [14, Thm. 4] are not directly related to those presented hereafter.

Theorem 1. Let the sequence of PDFs on $\mathbb{R}$, $\{p_m(y)\}_{m\geq 1}$, and $p(y)$ satisfy the following conditions:

C1- The PDFs $\{p_m(y)\}_m$ and $p(y)$ are uniformly upper-bounded:
$$ \exists\, M \in (0,\infty) \ \text{s.t.}\ \sup_{y\in\mathbb{R},\, m\geq 1}\left\{p_m(y),\, p(y)\right\} \leq M. \tag{2} $$

C2- There exists a non-negative and non-decreasing function $l : [0,\infty) \to [0,\infty)$, such that $l(y) = \omega(\ln(y))$ and
$$ \sup_{m\geq 1}\left\{\mathbb{E}_{p_m}\left[l(|Y|)\right],\, \mathbb{E}_{p}\left[l(|Y|)\right]\right\} \leq L, \tag{3} $$
for some positive (finite) value $L$.

Under these conditions, $h(p_m) \to h(p)$ whenever the PDFs $p_m(y) \to p(y)$ point-wise.

Before we prove the theorem, we highlight the importance 
of condition C2 by providing an example where it is not 
satisfied, and the theorem does not hold. 


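Example 1. Consider the sequence of PDFs $\{p_m(x)\}_{m\geq 3}$ defined on $\mathbb{R}$ as follows (and equal to zero outside $[0, m]$):
$$ p_m(x) = \begin{cases} 1 - \dfrac{1}{\ln m}, & x \in [0, 1],\\[2mm] \dfrac{1}{(\ln m)^2\, x}, & x \in (1, m]. \end{cases} $$

This sequence of PDFs converges point-wise to $p(x)$, the uniform distribution on $[0,1]$, and condition C1 is satisfied with a uniform upper bound $M = 1$. Computing the differential entropies,
$$ h(p) = 0, $$
$$\begin{aligned}
h(p_m) &= -\int_0^1 \left(1-\frac{1}{\ln m}\right)\ln\left(1-\frac{1}{\ln m}\right)dx - \int_1^m \frac{1}{(\ln m)^2\,x}\,\ln\frac{1}{(\ln m)^2\,x}\,dx\\
&= -\left(1-\frac{1}{\ln m}\right)\ln\left(1-\frac{1}{\ln m}\right) + \frac{2\ln(\ln m)}{\ln m} + \frac{1}{2} \;\longrightarrow\; \frac{1}{2} \quad \text{as } m \to \infty,
\end{aligned}$$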


and hence there is no convergence of differential entropies. This is explained by the fact that condition C2 is not satisfied. Indeed, consider any function $l(x)$ that is non-negative, non-decreasing and such that $l(x) = \omega(\ln x)$. By definition, for any $\kappa > 0$, there exists a $c > 0$, which we may take with $c \geq 1$, such that $l(x) > \kappa\ln x$ for $x > c$. Therefore, for any $m > c$,


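$$\begin{aligned}
\mathbb{E}_{p_m}\left[l(|X|)\right] &= \left(1-\frac{1}{\ln m}\right)\int_0^1 l(x)\,dx + \frac{1}{(\ln m)^2}\int_1^c \frac{1}{x}\,l(x)\,dx + \frac{1}{(\ln m)^2}\int_c^m \frac{1}{x}\,l(x)\,dx\\
&\geq \left(1-\frac{1}{\ln m}\right)\int_0^1 l(x)\,dx + \frac{1}{(\ln m)^2}\int_1^c \frac{1}{x}\,l(x)\,dx + \frac{\kappa}{(\ln m)^2}\int_c^m \frac{\ln x}{x}\,dx\\
&= \left(1-\frac{1}{\ln m}\right)\int_0^1 l(x)\,dx + \frac{1}{(\ln m)^2}\int_1^c \frac{1}{x}\,l(x)\,dx + \kappa\,\frac{(\ln m)^2-(\ln c)^2}{2(\ln m)^2}\\
&\geq \kappa\,\frac{(\ln m)^2-(\ln c)^2}{2(\ln m)^2},
\end{aligned}$$
which is greater than $\kappa/4$ whenever $m > c^2$. Since the inequality holds for any $\kappa > 0$ and $m$ large enough, $\sup_m \mathbb{E}_{p_m}\left[l(|X|)\right]$ is unbounded, which violates condition C2. We proceed next to the proof of Theorem 1.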
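Example 1 can also be checked numerically. The sketch below (function names are hypothetical) evaluates $h(p_m)$ from the closed form above and again by quadrature after the substitution $x = e^u$, and returns the two tail moments $\mathbb{E}_{p_m}[\ln X; X>1] = 1/2$ and $\mathbb{E}_{p_m}[\ln^2 X; X>1] = (\ln m)/3$, which follow from direct integration. The logarithmic moment stays bounded while a super-logarithmic one blows up, which is exactly the situation that condition C2 rules out.

```python
import math
from typing import Tuple


def h_pm(m: float) -> float:
    """Closed-form differential entropy (nats) of p_m from Example 1, m >= 3."""
    L = math.log(m)
    a = 1.0 / L                                # probability mass of the tail (1, m]
    head = -(1.0 - a) * math.log(1.0 - a)      # flat part on [0, 1]
    return head + 2.0 * math.log(L) / L + 0.5  # tail part: 2 ln(ln m)/ln m + 1/2


def h_pm_numeric(m: float, n: int = 20_000) -> float:
    """Same entropy, with the tail integral evaluated numerically after x = e^u."""
    L = math.log(m)
    a = 1.0 / L
    head = -(1.0 - a) * math.log(1.0 - a)
    du = L / n
    # On (1, m]: -p_m(x) ln p_m(x) dx = (2 ln L + u) / L^2 du, with x = e^u, u in (0, L]
    tail = sum((2.0 * math.log(L) + (k + 0.5) * du) / L ** 2 * du for k in range(n))
    return head + tail


def tail_moments(m: float) -> Tuple[float, float]:
    """Closed-form E_{p_m}[ln X ; X > 1] and E_{p_m}[ln^2 X ; X > 1]."""
    L = math.log(m)
    return 0.5, L / 3.0


if __name__ == "__main__":
    for m in (1e3, 1e10, 1e100, 1e300):
        # entropy stays above 1/2 (never approaches h(p) = 0); ln-moment fixed, ln^2-moment grows
        print(m, h_pm(m), h_pm_numeric(m), tail_moments(m))
```

Since the tail integrand is linear in $u$, the midpoint rule reproduces the closed form up to rounding, which makes the check a direct confirmation of the computation above.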

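Proof: We start by noting that the differential entropies $h(p)$ and $\{h(p_m)\}_{m\geq 1}$ exist and are finite by virtue of the fact that the PDFs are upper-bounded and have a finite logarithmic moment [15, Proposition 1].

Assume now that the conditions of the theorem hold and that $p_m$ converges to $p$ point-wise. If the upper bound $M$ in (2) is larger than one, consider the change of variables $Z = MY$ (for which $h(Z) = h(Y) + \ln M$), or equivalently the PDFs
$$ d(y) = \frac{1}{M}\,p\!\left(\frac{y}{M}\right), \qquad d_m(y) = \frac{1}{M}\,p_m\!\left(\frac{y}{M}\right). $$

These densities are upper-bounded by one and the sequence $\{d_m(y)\}$ converges point-wise to $d(y)$. Furthermore, the function $l'(y) = l(y/M)$ is non-negative, non-decreasing and $l'(y) = \omega(\ln(y))$. Additionally,
$$ \mathbb{E}_{d_m}\left[l'(|Y|)\right] = \mathbb{E}_{p_m}\left[l'(|MY|)\right] = \mathbb{E}_{p_m}\left[l(|Y|)\right] \leq L. $$

The conditions of the theorem therefore hold for the laws $\{d_m, d\}$, and since $h(d_m) - h(d) = h(p_m) - h(p)$, in what follows we assume without loss of generality that $M \leq 1$, so that the differential entropies are all non-negative.

Let $\bar{y}$ be any positive scalar such that $l(\bar{y}) > 0$, and denote by $q(y) = \frac{1}{\pi(1+y^2)}$ the Cauchy density. Then, using the convention “$0 \ln 0 = 0$” and the fact that $y \ln y \geq -\frac{1}{e}$ for $y > 0$, we can write
$$\begin{aligned}
-\int_{|y|>\bar{y}} p(y)\ln p(y)\,dy
&= -\int_{|y|>\bar{y}} p(y)\ln q(y)\,dy + \int_{|y|>\bar{y}} q(y)\,\frac{p(y)}{q(y)}\ln\frac{q(y)}{p(y)}\,dy\\
&\leq \ln\pi\int_{|y|>\bar{y}} p(y)\,dy + \int_{|y|>\bar{y}} \ln\left[1+y^2\right] p(y)\,dy + \frac{1}{e}\int_{|y|>\bar{y}} q(y)\,dy\\
&\leq \frac{\ln\pi}{l(\bar{y})}\int_{|y|>\bar{y}} l(|y|)\,p(y)\,dy + \int_{|y|>\bar{y}} \ln\left[1+y^2\right] p(y)\,dy + \frac{1}{e\ln\left[1+\bar{y}^2\right]}\int_{|y|>\bar{y}} \ln\left[1+y^2\right] q(y)\,dy,
\end{aligned} \tag{4}$$
where inequality (4) is due to the fact that $l(\cdot)$ is non-decreasing and that $\ln[1+y^2]$ is non-decreasing in $|y|$. Since $p \leq 1$, the left-hand side of (4) is non-negative. Hence, using $\ln[1+y^2] \leq 2\ln[1+|y|]$,
$$\begin{aligned}
\left|\int_{|y|>\bar{y}} p(y)\ln p(y)\,dy\right|
&\leq \frac{\ln\pi}{l(\bar{y})}\,\mathbb{E}_p\left[l(|Y|)\right] + 2\int_{|y|>\bar{y}} \ln\left[1+|y|\right] p(y)\,dy + \frac{1}{e}\,\frac{\mathbb{E}_q\left[\ln\left[1+Y^2\right]\right]}{\ln\left[1+\bar{y}^2\right]} &&\text{(5)}\\
&\leq \frac{L\ln\pi}{l(\bar{y})} + 2\sup_{|y|>\bar{y}}\left\{\frac{\ln\left[1+|y|\right]}{l(|y|)}\right\}\int_{|y|>\bar{y}} l(|y|)\,p(y)\,dy + \frac{1}{e}\,\frac{\ln 4}{\ln\left[1+\bar{y}^2\right]} &&\text{(6)}\\
&\leq \frac{L\ln\pi}{l(\bar{y})} + 2L\sup_{|y|>\bar{y}}\left\{\frac{\ln\left[1+|y|\right]}{l(|y|)}\right\} + \frac{1}{e}\,\frac{\ln 4}{\ln\left[1+\bar{y}^2\right]}, &&\text{(7)}
\end{aligned}$$
where (6) is justified since $l(|y|)$ is positive for $|y| > \bar{y}$ and $\ln[1+|y|]$ is non-negative. In order to write (6) we also use the identity $\mathbb{E}_q\left[\ln\left(1+Y^2\right)\right] = \ln 4$ [16, Sec. 3.1.3, p. 51]. The supremum in (6) and (7) is finite, and goes to 0, for $\bar{y}$ large enough because $l(y) = \omega(\ln y)$.

Since the upper bound (7) also holds for any $p_m(y)$, for every $\delta > 0$ there exists a $\bar{y} > 0$ such that for all $m \geq 1$:
$$ \left|\int_{|y|>\bar{y}} p_m(y)\ln p_m(y)\,dy\right| < \delta \quad\text{and}\quad \left|\int_{|y|>\bar{y}} p(y)\ln p(y)\,dy\right| < \delta. $$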


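It remains to show that
$$ \lim_{m\to+\infty} -\int_{|y|\leq\bar{y}} p_m(y)\ln p_m(y)\,dy = -\int_{|y|\leq\bar{y}} p(y)\ln p(y)\,dy, $$
which is guaranteed by the Dominated Convergence Theorem (DCT), since $p_m(y)\ln p_m(y) \to p(y)\ln p(y)$ point-wise and $|p_m(y)\ln p_m(y)| \leq \frac{1}{e}$ by virtue of the fact that $p_m(y) \leq 1$ for all $m$. Combined with the tail bound, this yields $\limsup_{m} |h(p_m) - h(p)| \leq 2\delta$ for every $\delta > 0$, which completes the proof. ■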
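Two ingredients of this proof can be checked numerically: the Cauchy identity $\mathbb{E}_q[\ln(1+Y^2)] = \ln 4$, and the fact that the right-hand side of (7) can be made smaller than any $\delta$ by taking $\bar{y}$ large. The sketch below assumes the illustrative choice $l(y) = \ln^2(1+y)$ (ours, not the paper's), for which the supremum in (7) equals $1/\ln(1+\bar{y})$. The helper names are hypothetical, and the search in `y_bar_for` only scans powers of ten.

```python
import math


def cauchy_log1p_sq_moment(n: int = 200_000) -> float:
    """E_q[ln(1 + Y^2)] for the standard Cauchy density q (should be close to ln 4).

    With Y = tan(theta), ln(1 + Y^2) = -2 ln cos(theta); by symmetry the mean is
    (2/pi) * integral_0^{pi/2} -2 ln cos(theta) d(theta), done by the midpoint rule.
    """
    h = (math.pi / 2) / n
    s = sum(-2.0 * math.log(math.cos((k + 0.5) * h)) for k in range(n))
    return (2.0 / math.pi) * s * h


def tail_bound(y_bar: float, L: float) -> float:
    """Right-hand side of (7) for the illustrative choice l(y) = ln^2(1 + y).

    For this l, sup_{|y| > y_bar} ln(1 + |y|) / l(|y|) = 1 / ln(1 + y_bar).
    """
    lg = math.log1p(y_bar)
    return (L * math.log(math.pi) / lg ** 2
            + 2.0 * L / lg
            + math.log(4.0) / (math.e * math.log1p(y_bar * y_bar)))


def y_bar_for(delta: float, L: float) -> float:
    """Smallest power of ten y_bar (>= 10) with tail_bound(y_bar, L) < delta."""
    y = 10.0
    while tail_bound(y, L) >= delta:
        y *= 10.0
    return y


if __name__ == "__main__":
    print(cauchy_log1p_sq_moment(), math.log(4.0))
    print(y_bar_for(0.05, 1.0))
```

The bound decays only like $1/\ln\bar{y}$ for this choice of $l$, so the required $\bar{y}$ is astronomically large; the proof only needs it to exist.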
III. Sufficient conditions for finiteness of channel capacity

In what follows we derive sufficient conditions for a memoryless additive-noise channel to have a finite and achievable capacity. More specifically, we consider a generic discrete-time, real and memoryless noisy communication channel where the noise is additive and where the input and output are possibly non-linearly related as follows:

Y = f(X) + N, (8) 

where $Y \in \mathbb{R}$ is the channel output and where the input $X$ is assumed to have an alphabet $\mathcal{X} \subseteq \mathbb{R}$. The channel's input is distorted according to the deterministic and possibly non-linear function $f(x)$. Additionally, the communication channel is subject to an additive noise, independent of the input, that is absolutely continuous with PDF $p_N(\cdot)$.

Finally, we assume that the input is subject to an average cost constraint of the form $\mathbb{E}[C(|X|)] \leq A$, for some $A \in (0,\infty)$, where $C(\cdot)$ is some cost function $C : [0,\infty) \to \mathbb{R}$.


Accordingly, we define for $A > 0$
$$ \mathcal{P}_A = \left\{\text{Prob. distributions } F : \int C(|x|)\,dF(x) \leq A\right\}, \tag{9} $$
the set of all distribution functions satisfying the average cost constraint.

The primary question that we would like to answer is whether one can reliably transmit an arbitrarily large number of bits per channel use over this channel. Said differently, are the achievable rates over this channel bounded? The answer to this question follows from the answers to the following two questions:

• Is the mutual information between a feasible input and 
the corresponding output always finite? 

• If it is the case, can this mutual information be arbitrarily 
large? 

A positive answer to the first question allows us, by the coding theorem [17], to state that the channel capacity is the supremum of the mutual information $I(\cdot)$ between the input $X$ and output $Y$ over all input probability distributions $F$ that meet the constraint, i.e., over $\mathcal{P}_A$:
$$ C = \sup_{F\in\mathcal{P}_A} I(F). $$

For the channel at hand (8), we note that the channel transition probability law is absolutely continuous with density function given by
$$ p_{Y|X}(y|x) = p_N\left(y - f(x)\right), \quad y \in \mathbb{R},\ x \in \mathcal{X}, \tag{10} $$
and the mutual information may be expressed as [11]
$$ I(F) = \int\!\!\int p_N\left(y - f(x)\right)\ln\frac{p_N\left(y - f(x)\right)}{p(y;F)}\,dy\,dF(x), \tag{11} $$
where $p(y;F) = \int p_N\left(y - f(x)\right)dF(x)$ is the output PDF.
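For a discrete input $F$, the outer integral in (11) reduces to a finite sum, and (11) can be evaluated with a one-dimensional quadrature in $y$. The sketch below assumes such a discrete input and a midpoint rule on a finite window $[y_{\min}, y_{\max}]$ that must contain essentially all of the output mass; the names `mutual_information`, `gauss`, `unif01` and `identity` are hypothetical. With well-separated binary inputs in Gaussian noise the result is close to $\ln 2$; with integer inputs in uniform noise on $[0,1)$ the inputs are perfectly recoverable and the result is $\ln 4$, the input entropy, which is the mechanism Example 2 below relies on.

```python
import math
from typing import Callable, Sequence


def mutual_information(xs: Sequence[float], ws: Sequence[float],
                       f: Callable[[float], float], p_N: Callable[[float], float],
                       y_min: float, y_max: float, n: int = 100_000) -> float:
    """Evaluate (11), in nats, for a discrete input F = sum_j ws[j] * delta(x - xs[j]).

    The outer integral over F becomes a finite sum; the integral over y is a
    midpoint rule on [y_min, y_max], which must hold essentially all output mass.
    """
    dy = (y_max - y_min) / n
    total = 0.0
    for k in range(n):
        y = y_min + (k + 0.5) * dy
        cond = [p_N(y - f(x)) for x in xs]          # p_{Y|X}(y|x_j), as in (10)
        p_y = sum(w * c for w, c in zip(ws, cond))  # output PDF p(y; F)
        for w, c in zip(ws, cond):
            if c > 0.0:                              # convention 0 ln 0 = 0
                total += w * c * math.log(c / p_y) * dy
    return total


def gauss(u: float) -> float:
    """Standard Gaussian noise PDF."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def unif01(u: float) -> float:
    """Uniform noise PDF on [0, 1)."""
    return 1.0 if 0.0 <= u < 1.0 else 0.0


def identity(x: float) -> float:
    """Linear channel f(x) = x."""
    return x


if __name__ == "__main__":
    print(mutual_information([-20.0, 20.0], [0.5, 0.5], identity, gauss, -30.0, 30.0, 60_000))
    print(mutual_information([0.0, 1.0, 2.0, 3.0], [0.25] * 4, identity, unif01, -1.0, 5.0, 6_000))
```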


Sufficient conditions 

We make the following rather mild assumptions:

• The cost function $C(\cdot)$:

A1- The cost function is lower semi-continuous.

A2- The cost function is non-decreasing.

A3- $C(|x|) = \omega(\ln|f(x)|)$.

Without loss of generality, one may assume that $C(\cdot)$ is non-negative. For if it were not, define $C'(|x|) = C(|x|) - C(0)$ and adjust the input constraint accordingly.

• The function $f(\cdot)$:

A4- The function is continuous.

A5- The absolute value $|f(\cdot)|$ is a non-decreasing function of $|x|$, and $|f(x)| \to +\infty$ as $|x| \to +\infty$.

• The noise PDF $p_N(\cdot)$:

A6- The PDF is continuous on $\mathbb{R}$.

A7- The PDF is upper-bounded.

A8- There exists a non-decreasing function $C_N : [0,\infty) \to \mathbb{R}$ such that $C_N(|n|) = \omega(\ln|n|)$, and
$$ \mathbb{E}_N\left[C_N(|N|)\right] = L_N < \infty. $$
As an example, this condition holds true for any noise PDF whose tail is “faster” than $\frac{1}{x(\ln x)^2}$.

Conditions A7 and A8 guarantee that the noise differential entropy $h_N$ exists and is finite [15, Proposition 1]. Since, from an information-theoretic perspective, the general channel model (8) is invariant with respect to output scaling, we consider without loss of generality that the noise PDF is upper-bounded by one.

Also without loss of generality, one may assume that $C_N(\cdot)$ is non-negative. Otherwise, one may adopt $C'_N(|n|) = C_N(|n|) - C_N(0)$.

The above assumptions are sufficient conditions on the triplet $f(\cdot)$, $C(\cdot)$ and $p_N(\cdot)$ that guarantee the finiteness and the achievability of the capacity of channel (8):

Theorem 2. Under conditions A1 through A8, the capacity of the average-cost constrained channel (8) is finite and achievable.

Furthermore, the maximum is achieved by a unique $F^*$ in $\mathcal{P}_A$ if and only if the output PDF is injective in $F$.

We point out that assumptions A4 through A8 are related to the channel model at hand and are not “conditions” per se. These assumptions are satisfied by the vast majority of common models found in the literature.

When thinking in terms of conditions on the input, which is controlled by the user, A1, A2 and A3 are to be considered. Note that these conditions are also satisfied by the cost functions commonly found in the literature. While A1 and A2 are rather technical, the relevance of A3 may be seen in the following example.

Example 2. Consider the linear additive channel (1), where now the noise $N$ is a uniformly distributed random variable on the interval $[0,1)$.

Let $X_1$ and $X_2$ be two discrete random variables taking integer values $k \geq 2$, with respective probability mass functions:
$$ p_{X_1}(k) = \frac{B_1}{k(\ln k)^2}, \qquad p_{X_2}(k) = \frac{B_2}{k(\ln k)^3}, \qquad k \geq 2, $$
where $B_1$ and $B_2$ are the normalizing finite constants:
$$ B_1 = \left[\sum_{k=2}^{\infty}\frac{1}{k(\ln k)^2}\right]^{-1}, \qquad B_2 = \left[\sum_{k=2}^{\infty}\frac{1}{k(\ln k)^3}\right]^{-1}. $$

Let $Y_1$ and $Y_2$ be the outputs of channel (1) whenever its inputs are $X_1$ and $X_2$, respectively. Given the placement of the mass points, $X_1$ may be perfectly inferred from $Y_1$ and $H(X_1|Y_1) = 0$. Similarly $H(X_2|Y_2) = 0$, and therefore the mutual informations are
$$ I(X_1;Y_1) = H(X_1) - H(X_1|Y_1) = H(X_1), \qquad I(X_2;Y_2) = H(X_2). $$

Computing $H(X_1)$ and $H(X_2)$, we obtain:
$$ H(X_i) = -\sum_{k\geq 2} p_{X_i}(k)\ln p_{X_i}(k) = -\ln B_i + B_i\sum_{k\geq 2}\frac{\ln k + (1+i)\ln(\ln k)}{k(\ln k)^{1+i}}, \qquad i = 1, 2, $$
which diverges for $i = 1$ and converges for $i = 2$. Accordingly, the mutual information of channel (1) is infinite when the input is $X_1$, whereas it is finite for input $X_2$. Note that $\mathbb{E}[\ln X_1]$ is infinite while $\mathbb{E}[\ln X_2]$ is finite, and this example showcases the importance of condition A3 when it comes to the finiteness of mutual information. Whenever A3 is not enforced, the channel capacity might be infinite, as $X_1$ yields an infinite mutual information. The theorem states that when the condition is enforced, the capacity will be finite.

An interesting observation is that both $\mathbb{E}[X_1^2]$ and $\mathbb{E}[X_2^2]$ are infinite; however, as inputs to the channel they yield respectively an infinite and a finite mutual information. We proceed next to prove Theorem 2.

Proof: The first statement of the theorem is established using the extreme value principle, which we state for completeness and which can be found in [12]:

Theorem. If $I(\cdot)$ is a real-valued, weak-continuous functional on a compact set $\Omega \subset \mathcal{F}$, then $I(\cdot)$ achieves its maximum on $\Omega$.

In order to apply this principle, we show in Section IV that the set $\mathcal{P}_A$ is compact (Theorem 3) and that the mutual information $I(F)$ is finite and continuous (Theorems 4 and 5). Therefore, the capacity of the average-cost constrained channel is finite and achievable.

When it comes to uniqueness, since $\mathcal{P}_A$ is convex (Theorem 3), whenever $I(\cdot)$ is strictly concave the maximum
$$ C = \max_{F\in\mathcal{P}_A} I(F) $$
is achieved by a unique $F^*$ in $\mathcal{P}_A$.

Knowing that $I(\cdot)$ is concave (Theorem 5), its strict concavity is equivalent to the strict concavity of the output differential entropy in $p_Y(\cdot)$. This is indeed the case if and only if $p_Y(\cdot)$ is injective in $F$. ■

The next section is dedicated to the proofs of Theorems 3, 4 and 5.

IV. Proofs of the Theorems

We use techniques that were first developed in [11] and later adopted in various works on mutual information maximization, as in [18]: denote by $\mathcal{F}$ the space of all probability distribution functions on $\mathbb{R}$. We adopt weak convergence [19, III-1, Def. 2, p. 311] on $\mathcal{F}$, and use the Lévy metric to metrize this weak convergence [5, Th. 3.3, p. 25]. The optimization is carried out in this metric topology.


Optimization set properties

Theorem 3. Whenever conditions A1, A2, A3 and A5 are satisfied, the set $\mathcal{P}_A$ defined in (9) is convex and compact.

Proof:

We note first that the theorem was shown to hold for cost functions of the form $C(|x|) = |x|^r$, for $r > 1$, in [1], [18]. We adopt the same methodologies to generalize the results presented hereafter.

Convexity: Let $F_1$ and $F_2$ be two probability distribution functions in $\mathcal{P}_A$, and $\lambda$ some scalar between 0 and 1. Define $F = \lambda F_1 + (1-\lambda)F_2$. It is clear that $F$ is a probability distribution function because it is non-decreasing, right continuous, $F(-\infty) = 0$ and $F(+\infty) = 1$. Additionally,
$$\begin{aligned}
\int_{\mathbb{R}} C(|x|)\,dF &= \int_{\mathbb{R}} C(|x|)\,d\left(\lambda F_1 + (1-\lambda)F_2\right)\\
&= \lambda\int_{\mathbb{R}} C(|x|)\,dF_1 + (1-\lambda)\int_{\mathbb{R}} C(|x|)\,dF_2\\
&\leq \lambda A + (1-\lambda)A = A.
\end{aligned}$$
Therefore, $F \in \mathcal{P}_A$ and $\mathcal{P}_A$ is convex.

Compactness: Consider a random variable $X$ with probability distribution function $F \in \mathcal{P}_A$. Applying Markov's inequality to the non-negative random variable $C(|X|)$ yields
$$ \Pr\{C(|X|) > a\} \leq \frac{A}{a}, \quad \forall a > 0. $$

Now let
$$ K = \inf\{x \in [0,\infty) \ \text{s.t.}\ C(x) > a\} + 1, $$
which is always greater than or equal to 1. For any finite value of $a$, such a $K$ exists since $C(x)$ increases to $+\infty$ as $x \to +\infty$ by virtue of properties A3 and A5. Additionally, since $C(\cdot)$ is non-decreasing by property A2,
$$ \Pr\{C(|X|) > a\} \geq \Pr\{|X| > K-1\} \geq \Pr\{|X| \geq K\} \geq F(-K) + \left(1 - F(K)\right). $$

Hence, for all $F \in \mathcal{P}_A$, we obtain
$$ F(-K) + \left[1 - F(K)\right] \leq \frac{A}{a}. $$
Therefore, for every $\varepsilon > 0$, there exists a $K_\varepsilon > 0$, namely
$$ K_\varepsilon = \inf\{x \in [0,\infty) \ \text{s.t.}\ C(x) > A/\varepsilon\} + 1, $$
such that
$$ \sup_{F\in\mathcal{P}_A}\left[F(-K_\varepsilon) + \left[1 - F(K_\varepsilon)\right]\right] \leq \varepsilon. $$
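The threshold $K_\varepsilon$ above is easy to compute once a cost is fixed. The sketch below assumes the illustrative cost $C(x) = \ln^2(1+x)$ with $f(x) = x$ (our choice; it satisfies A1, A2, A3 and A5), for which $K_\varepsilon = e^{\sqrt{A/\varepsilon}}$ in closed form. It computes $K_\varepsilon$ by bisection, sets $A$ equal to the cost moment of a standard Cauchy input so that this input lies in $\mathcal{P}_A$, and lets one check that the Cauchy tail mass beyond $K_\varepsilon$ is at most $\varepsilon$, as the uniform bound requires. All function names are hypothetical.

```python
import math
from typing import Callable


def k_epsilon(C: Callable[[float], float], A: float, eps: float) -> float:
    """K_eps = inf{x >= 0 : C(x) > A/eps} + 1 for a non-decreasing cost C, via bisection."""
    a = A / eps
    if C(0.0) > a:
        return 1.0
    hi = 1.0
    while C(hi) <= a:          # C grows to +infinity (A3 and A5), so this loop terminates
        hi *= 2.0
    lo = 0.0
    for _ in range(200):       # shrink [lo, hi] around the threshold of {x : C(x) > a}
        mid = 0.5 * (lo + hi)
        if C(mid) > a:
            hi = mid
        else:
            lo = mid
    return hi + 1.0


def cost(x: float) -> float:
    """Illustrative cost C(x) = ln^2(1 + x); then K_eps = exp(sqrt(A/eps))."""
    return math.log1p(x) ** 2


def cauchy_cost_moment(n: int = 200_000) -> float:
    """E[C(|X|)] for a standard Cauchy X (midpoint rule after x = tan(theta))."""
    h = (math.pi / 2) / n
    return (2 / math.pi) * h * sum(cost(math.tan((k + 0.5) * h)) for k in range(n))


def cauchy_tail(K: float) -> float:
    """Pr{|X| >= K} for a standard Cauchy X."""
    return 1.0 - (2.0 / math.pi) * math.atan(K)


if __name__ == "__main__":
    A = cauchy_cost_moment()   # budget chosen so that the standard Cauchy input is in P_A
    for eps in (0.5, 0.1, 0.01):
        K = k_epsilon(cost, A, eps)
        print(eps, K, cauchy_tail(K))
```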

This implies that $\mathcal{P}_A$ is tight [19, III-2, Def. 2, p. 318]. By Prokhorov's Theorem [19, III-2, Th. 1, p. 318], $\mathcal{P}_A$ is therefore relatively sequentially compact, and every sequence $\{F_n\}$ of distribution functions in $\mathcal{P}_A$ has a convergent sub-sequence $\{F_{n_j}\}$ whose limit $F^*$ does not necessarily belong to $\mathcal{P}_A$. If we prove that $F^* \in \mathcal{P}_A$, the latter will be sequentially compact, and hence compact since the space is metrizable [20, Th. 28.2, p. 179]. In order to show that the limiting distribution function $F^*$ is in $\mathcal{P}_A$, it suffices to verify that it satisfies the cost constraint, which is the case. In fact,
$$ \int C(|u|)\,dF^*(u) \leq \liminf_{n_j\to\infty} \int C(|u|)\,dF_{n_j}(u) \leq A, $$
where the first inequality holds because $C(|u|)$ is lower semi-continuous (property A1) and is bounded from below by $C(0)$ for all $u \in \mathbb{R}$ (property A2) [21, Th. 4.4.4]. In addition, the second inequality is valid since the sub-sequence $\{F_{n_j}\}$ is in $\mathcal{P}_A$ and therefore satisfies the cost constraint for all $n_j$. Finally, $F^* \in \mathcal{P}_A$ and $\mathcal{P}_A$ is compact. ■

Properties of the mutual information, $I(\cdot)$

We prove in what follows the finiteness, concavity and continuity of $I(\cdot)$ on $\mathcal{P}_A$ through Theorems 4 and 5.

Theorem 4. Whenever conditions A3, A7 and A8 hold, the mutual information $I(F)$ between the input and output of channel (8) is finite for all input distribution functions $F$ such that $\mathbb{E}[C(|X|)]$ is finite.

Proof: Since $Y = f(X) + N$,
$$ \ln\left[1+|Y|\right] \leq \ln\left[1+|f(X)|\right] + \ln\left[1+|N|\right], $$
and $\mathbb{E}\left[\ln\left[1+|Y|\right]\right]$ is finite because both $\mathbb{E}\left[\ln\left[1+|f(X)|\right]\right]$ and $\mathbb{E}\left[\ln\left[1+|N|\right]\right]$ are finite (by properties A3 and A8).

Moreover, since $p_Y(y)$ is upper-bounded by one (by A7), the differential entropy of $Y$, $h_Y(F) = -\int p(y;F)\ln p(y;F)\,dy$, is well defined [15, Proposition 1] and $0 \leq h_Y(F) < +\infty$.

The differential entropy $h_N$ of the noise being finite (due to properties A7 and A8), the mutual information (11) can therefore be written as the difference of two terms:
$$ I(F) = h_Y(F) - h_{Y|X}(F) = h_Y(F) - h_N, \tag{12} $$
both of which are finite, and this completes the proof. ■

Theorem 5. Assume that conditions A2 through A8 hold. Under a cost constraint
$$ \int C(|x|)\,dF(x) \leq A, \quad A > 0, $$
the mutual information $I(F)$ between the input and the output of channel (8) is concave and continuous whenever $C(|x|) = \omega(\ln|f(x)|)$.

Before we proceed with the proof, we note that under the conditions of the theorem, the mutual information $I(F)$ between the input and the output of channel (8) is finite by virtue of Theorem 4.

Proof:

Concavity: The output differential entropy $h_Y(F)$ is a concave function of $F$ on $\mathcal{F}$. In fact,
$$ h_Y(F) = -\int p_Y(y;F)\ln p_Y(y;F)\,dy $$
exists (by Theorem 4) and is a concave function of $p_Y(\cdot)$ because $-x\ln x$ is concave in $x$. Since $p_Y(\cdot\,;F)$ is linear in $F$, $I(F) = h_Y(F) - h_N$ is concave on $\mathcal{P}_A$.


Continuity: To prove the continuity of $I(F)$, it suffices to show that $h_Y(F)$ is continuous, by virtue of equation (12). To this end, we let $F \in \mathcal{P}_A$ and let $\{F_m\}_{m\geq 1}$ be a sequence of probability measures in $\mathcal{P}_A$ that converges weakly to $F$.

In order to apply Theorem 1 and show the convergence of $h_Y(F_m)$ to $h_Y(F)$, and hence the weak continuity of $h_Y(F)$ on $\mathcal{P}_A$, we establish that the appropriate conditions are satisfied:

• By definition of weak convergence, since $p_N(y - f(x))$ is bounded and continuous in $x$ (properties A4, A6 and A7), $p(y;F_m) = \int p_N(y - f(x))\,dF_m(x)$ converges point-wise to $p(y;F) = \int p_N(y - f(x))\,dF(x)$.

• The induced output PDF $p(y;F_m)$ is also bounded by one.

• It remains to find a non-negative and non-decreasing function $l : [0,\infty) \to [0,\infty)$ such that $l(y) = \omega(\ln(y))$, and a scalar $L > 0$ such that equation (3) holds for $p(y;F_m)$, $m \geq 1$, and $p(y;F)$, a task which we fulfill in what follows.

For any $y > |f(0)|$, let $S = f^{-1}\left([|f(0)|, y]\right)$ be the inverse image by $f(\cdot)$ of the closed interval $[|f(0)|, y]$. Since $f(\cdot)$ is continuous (A4), the set $S$ is closed. It is also bounded because $|f(x)|$ is non-decreasing in $|x|$ and tends to infinity (A5). Indeed, any element in $S$ is smaller than a positive $t_u$ such that $|f(t_u)| = 2y$ and greater than a negative $t_b$ such that $|f(t_b)| = 2y$; such $t_u$ and $t_b$ exist because $f(\cdot)$ is continuous.

The set $S$ is therefore compact and has a maximal value that we denote $z(y) = \max\{z : z \in S\}$. Note that $|f(z(y))| = y$.

Define the function $C_{\min}(\cdot) : [0,\infty) \to \mathbb{R}$ as follows:
$$ C_{\min}(y) = \begin{cases} \min\left\{C(z(y)),\, C_N(y)\right\}, & y > |f(0)|,\\ 0, & \text{otherwise}, \end{cases} $$
where $C_N(\cdot)$ is defined in A8. The function $C_{\min}(y)$ is non-negative and non-decreasing on $[0,\infty)$ since both $C(\cdot)$ and $C_N(\cdot)$ are non-negative and non-decreasing by properties A2 and A8, and $z(y)$ is non-decreasing for $y > |f(0)|$. Additionally, $C_{\min}(y) = \omega(\ln y)$ because $C(|x|) = \omega(\ln|f(x)|)$ (A3) and $C_N(x) = \omega(\ln x)$ (A8).
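Now, for any $X$ with distribution $F \in \mathcal{P}_A$,
$$\begin{aligned}
\mathbb{E}_Y\!\left[C_{\min}\!\left(\tfrac{|Y|}{2}\right)\right]
&= \mathbb{E}_{X,N}\!\left[C_{\min}\!\left(\tfrac{|f(X)+N|}{2}\right)\right]\\
&\leq \mathbb{E}_{X,N}\!\left[C_{\min}\!\left(\tfrac{|f(X)|+|N|}{2}\right)\right] &&\text{(13)}\\
&= \mathbb{E}_{X,N}\!\left[C_{\min}\!\left(\tfrac{|f(X)|+|N|}{2}\right)\,\Big|\,|f(X)|\leq|N|\right]P\left(|f(X)|\leq|N|\right)\\
&\quad + \mathbb{E}_{X,N}\!\left[C_{\min}\!\left(\tfrac{|f(X)|+|N|}{2}\right)\,\Big|\,|f(X)|>|N|\right]P\left(|f(X)|>|N|\right)\\
&\leq \mathbb{E}_{X,N}\!\left[C_{\min}(|N|)\,\big|\,|f(X)|\leq|N|\right]P\left(|f(X)|\leq|N|\right)\\
&\quad + \mathbb{E}_{X,N}\!\left[C_{\min}(|f(X)|)\,\big|\,|f(X)|>|N|\right]P\left(|f(X)|>|N|\right) &&\text{(14)}\\
&\leq \mathbb{E}_N\!\left[C_{\min}(|N|)\right] + \mathbb{E}_X\!\left[C_{\min}(|f(X)|)\right] &&\text{(15)}\\
&\leq \mathbb{E}_N\!\left[C_N(|N|)\right] + \mathbb{E}_X\!\left[C(|X|)\right] \leq L_N + A = L < \infty, &&\text{(16)}
\end{aligned}$$
where $0 \leq L_N = \mathbb{E}_N[C_N(|N|)] < \infty$ by property A8. Equations (13) and (14) are justified since $C_{\min}(|x|)$ is non-decreasing in $|x|$ (together with $|f(X)+N| \leq |f(X)|+|N|$ and $\frac{a+b}{2} \leq \max\{a,b\}$), and (15) is due to the fact that $C_{\min}(|x|)$ is non-negative. Since the value $0 < L < \infty$ is independent of the input distribution function $F \in \mathcal{P}_A$, (16) holds for any output variable $Y$, i.e., for all $p(y;F)$ where $F \in \mathcal{P}_A$. Letting $l(y) = C_{\min}(y/2)$, $y \in [0,\infty)$, equation (3) is satisfied for $p(y;F_m)$, $m \geq 1$, and $p(y;F)$. Therefore, Theorem 1 applies: $h_Y(F_m)$ converges to $h_Y(F)$, hence $h_Y(F)$ is continuous, which concludes the proof. ■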
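The chain (13)–(16) can be sanity-checked by simulation in a simple instance. As a sketch, assume $f(x) = x$, $C(x) = C_N(x) = \ln^2(1+x)$, a standard Cauchy input and standard Gaussian noise (all illustrative choices, not taken from the paper). Then $f(0) = 0$, $z(y) = y$, $C_{\min}(y) = \ln^2(1+y)$ and $l(y) = C_{\min}(y/2)$. Because $|x+n|/2 \leq \max\{|x|,|n|\}$, the inequality $l(|y|) \leq C(|x|) + C_N(|n|)$ already holds sample by sample, so the empirical averages respect the bound for any seed; the function names are hypothetical.

```python
import math
import random
from typing import Tuple


def C(x: float) -> float:
    """Illustrative cost, also used as C_N: ln^2(1 + x)."""
    return math.log1p(x) ** 2


def l(y: float) -> float:
    """l(y) = C_min(y/2); with f(x) = x and C_N = C, C_min reduces to C."""
    return C(y / 2.0)


def check(n: int = 100_000, seed: int = 0) -> Tuple[float, float, float]:
    """Monte Carlo averages of l(|Y|), C_N(|N|) and C(|X|) for Y = X + N."""
    rng = random.Random(seed)
    lhs = rhs_noise = rhs_input = 0.0
    for _ in range(n):
        x = math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy input
        noise = rng.gauss(0.0, 1.0)                    # standard Gaussian noise
        y = x + noise                                  # channel (8) with f(x) = x
        lhs += l(abs(y))
        rhs_noise += C(abs(noise))
        rhs_input += C(abs(x))
    return lhs / n, rhs_noise / n, rhs_input / n


if __name__ == "__main__":
    avg_l, avg_cn, avg_c = check()
    print(avg_l, "<=", avg_cn, "+", avg_c)
```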


V. Extensions 

The results may be extended to the case where the noise PDF is not necessarily continuous on $\mathbb{R}$. In fact, we weaken condition A6 and show that Theorem 2 also holds for noise PDFs which are piece-wise continuous on a countable number of pieces. Note that this category includes absolutely continuous noise variables with a compact support⁴, such as the uniform, and also one-sided ones, such as the Gamma or the Pareto random variables. We start by noting the following:

• It can be seen from the proof of Theorem 1 that almost everywhere (a.e.)⁵ point-wise convergence with respect to the Lebesgue measure (in addition to C1 and C2) is sufficient in order to have convergence of differential entropies.

• According to the definition of weak convergence [22, p. 700], one can replace continuous bounded test functions by bounded $F$-a.e. continuous functions, where $F$ is the limit distribution.

⁴We define the support of a random variable as being the set of its points of increase, i.e., the set $\{x \in \mathbb{R} : \Pr(x-\eta < X < x+\eta) > 0 \text{ for all } \eta > 0\}$.

⁵We say that a property holds almost everywhere with respect to a measure $\mu$, and we denote it $\mu$-a.e., if and only if the measure by $\mu$ of the set where the property fails is equal to zero.

We show now that if $p_N(\cdot)$ has a countable number of discontinuities, then weak convergence of the input distributions implies Lebesgue-a.e. point-wise convergence of the output PDFs: denote by $\{a_i\}_{i\geq 1}$ the countable discontinuities of $p_N(\cdot)$ and by $\{x_j\}_{j\geq 1}$ the discontinuity points of $F$, which are necessarily countable (see the Jordan decomposition lemma in [23, p. 40]). Point-wise convergence of the PDFs holds except at values of $y$ of the form $y_{ij} = a_i + f(x_j)$, $i, j \geq 1$, i.e., where $y - f(x_j)$ hits a discontinuity of $p_N(\cdot)$. The fact that the $y_{ij}$'s are countable proves our assertion.
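A toy case makes the countable exceptions visible. As a sketch, assume uniform noise on $[0,1)$ (discontinuities $a_1 = 0$, $a_2 = 1$), $f(x) = x$, and point-mass inputs $F_m = \delta_{1/m}$ that converge weakly to $F = \delta_0$ (so $x_1 = 0$). The output PDFs then agree with the limit at every grid point except $y = 0$ and $y = 1$, exactly the points $a_i + f(x_1)$, for every large $m$. The helper names are hypothetical.

```python
from typing import List, Sequence


def unif01(u: float) -> float:
    """PDF of the uniform noise on [0, 1); discontinuous at u = 0 and u = 1."""
    return 1.0 if 0.0 <= u < 1.0 else 0.0


def identity(x: float) -> float:
    """Linear channel f(x) = x."""
    return x


def output_pdf(y: float, xs: Sequence[float], ws: Sequence[float]) -> float:
    """p(y; F) = sum_j ws[j] * p_N(y - f(xs[j])) for a discrete input F."""
    return sum(w * unif01(y - identity(x)) for x, w in zip(xs, ws))


def mismatch_points(m: int, ys: Sequence[float]) -> List[float]:
    """Grid points where p(y; F_m) differs from p(y; F), with F_m = delta_{1/m}, F = delta_0."""
    return [y for y in ys if output_pdf(y, [1.0 / m], [1.0]) != output_pdf(y, [0.0], [1.0])]


ys = [k / 100 for k in range(-50, 151)]  # y from -0.5 to 1.5 in steps of 0.01
print(mismatch_points(1000, ys))         # [0.0, 1.0]
```

At $y = 0$ the sequence stays at 0 while the limit is 1, and at $y = 1$ the reverse happens, so convergence genuinely fails there; elsewhere it holds, which is all that the Lebesgue-a.e. version of Theorem 1 needs.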

VI. Conclusion 

Tangible models for communication channels implicitly assume a finite value for the channel capacity. Knowing that maximizing the transmission rates is directly related to a constrained maximization problem, we have derived sufficient conditions for the finiteness and achievability of the capacity of generic memoryless additive noise channels. The involved conditions on the input-output relationship, the input cost function and the type of the noise define a wide collection of models for which finding codes that strive toward achieving maximum transmission rates is sensible. The result is applicable to possibly non-linear channels, to nearly all the widely known additive noise models, and to cost functions that are “super-logarithmic”. Interestingly, communication at finite rates is not directly related to an input average-power constraint: even when signaling strategies are allowed to have an infinite second moment on average, transmission rates cannot be arbitrarily large. Finally, while searching for sufficient conditions, we derived as an intermediate result conditions under which point-wise convergence of PDFs implies convergence of differential entropies.